Linguistically Annotated Reordering: Evaluation and Analysis
نویسندگان
چکیده
Linguistic knowledge plays an important role on phrase movement in statistical machine translation. To efficiently incorporate linguistic knowledge into phrase reordering, we propose a new approach: Linguistically Annotated Reordering (LAR). In LAR, we build hard hierarchical skeletons and inject soft linguistic knowledge from source parse trees to nodes of hard skeletons during translation. The experimental results on large-scale training data show that LAR is comparable with boundary word based reordering (BWR) (Xiong, Liu, and Lin 2006) which is a very competitive lexicalized reordering approach. When combined with BWR, LAR provides complementary information for phrase reordering, which collectively improves the BLEU score significantly. To further understand the contribution of linguistic knowledge in LAR to phrase reordering, we introduce a syntax-based analysis method to automatically detect constituent movement in both reference and system translations, and summarize syntactic reordering patterns that are captured by reordering models. With the proposed analysis method, we conduct a comparative analysis that not only provides the insight into how linguistic knowledge affects phrase movement but also reveals new challenges in phrase reordering.
منابع مشابه
Linguistically Annotated BTG for Statistical Machine Translation
Bracketing Transduction Grammar (BTG) is a natural choice for effective integration of desired linguistic knowledge into statistical machine translation (SMT). In this paper, we propose a Linguistically Annotated BTG (LABTG) for SMT. It conveys linguistic knowledge of source-side syntax structures to BTG hierarchical structures through linguistic annotation. From the linguistically annotated da...
متن کاملA Linguistically Annotated Reordering Model for BTG-based Statistical Machine Translation
In this paper, we propose a linguistically annotated reordering model for BTG-based statistical machine translation. The model incorporates linguistic knowledge to predict orders for both syntactic and non-syntactic phrases. The linguistic knowledge is automatically learned from source-side parse trees through an annotation algorithm. We empirically demonstrate that the proposed model leads to ...
متن کاملEffects of Parsing Errors on Pre-Reordering Performance for Chinese-to-Japanese SMT
Linguistically motivated reordering methods have been developed to improve word alignment especially for Statistical Machine Translation (SMT) on long distance language pairs. However, since they highly rely on the parsing accuracy, it is useful to explore the relationship between parsing and reordering. For Chinese-toJapanese SMT, we carry out a three-stage incremental comparative analysis to ...
متن کاملClause-Based Reordering Constraints to Improve Statistical Machine Translation
We demonstrate that statistical machine translation (SMT) can be improved substantially by imposing clause-based reordering constraints during decoding. Our analysis of clause-wise translation of different types of clauses shows that it is beneficial to apply these constraints for finite clauses, but not for non-finite clauses. In our experiments in English-Hindi translation with an SMT system ...
متن کاملThe SETimes.HR Linguistically Annotated Corpus of Croatian
We present SETIMES.HR— the first linguistically annotated corpus of Croatian that is freely available for all purposes. The corpus is built on top of the SETIMES parallel corpus of nine Southeast European languages and English. It is manually annotated for lemmas, morphosyntactic tags, named entities and dependency syntax. We couple the corpus with domain-sensitive test sets for Croatian and Se...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Computational Linguistics
دوره 36 شماره
صفحات -
تاریخ انتشار 2010